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Question 1: Data Cube (10 Points) 
A chain of bicycle stores wants to analyze its sales data. Analysis should be conducted on the 
granularity of store x brand x customer. Analysis should also be conducted on a regional level 
and analytic goals are number of sales, turnover, and turnover per sale. Each store, brand, 
and customer can be identified with a unique ID. Additional information like regional data 
for store and customer are available. In total, there are 100 stores in different cities, 10.000 
customers that can shop in every store, and 20 brands. 


a) Describe structure and properties of the data cube that results from modeling the 
analysis task above. (3 points) 


b) How is the data cube represented using the MOLAP approach. Which problem arises 
when using a naive approach. (2 points) 
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c) Describe the general structure of the corresponding star schema for the analysis task 
above. What are the maximum tuple counts per table? (2 points) 


d) Which operation can be applied on the data cube to conduct analytic tasks on the 
regional aggregation level of the stores. What happens to the data cube of la) when 
this function is applied? Which property does the data have to fulfill in order for this 
operation to be applicable? (3 points) 
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Question 2: Multidimensional Modeling (8 Points) 
The following schema represents an online shop where customers can buy products. Customers 
and products are identified by their names. Each product has a base price for one unit. Each 
customer has an address and a customer specific discount. An order is identified by its date, 
the customer, and a single ordered product. The price of an order is calculated using the 
order’s quantity (qty), the product’s base price, and the customer’s discount. A customer can 
submit a maximum of one rating for a product. 


customer | address | discount | product | base price date | qty | price | rating r_text 
Alice | A Town 5% | Battery 1.00 | 09.03.21 4 | 3.80 5 | Long lasting. 
Alice | A Town 5% | Battery 1.00 | 16.06.21 8 | 7.60 5 | Long lasting. 
Alice | A Town 5% Cable 5.00 | 09.03.22 1 | 4.75 - - 
Bob | B Town 1% Cable 5.00 | 03.12.21 2| 9.90 4 Very good. 

Bob | B Town 1% | Battery 1.00 | 22.01.22 4 | 3.96 1 | Poor quality. 


a) Completely normalize the given unity table into a new schema. (6 points) 
Describe the resulting schema including: 
e relations with their attributes and primary keys 


e foreign key relation ships 


b) Which type of multidimensional schema is the result of 2a). Explain your answer. 
(2 points) 
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Question 3: Multidimensional Queries (11 Points) 
The following excerpt of a table named energy models the energy consumption of cities. 
Additionally, the state and the country of the city are available. 


energy: 


: | country | state | city | consumption | ise 


a) Express the analyses of the cumulative energy consumption by city, state, & country, by 
state & country, by country, AND the total consumption (overall) in SQL WITHOUT 
using OLAP extensions. (4 points) 


b) Express the analysis task from 3a) using the rollup ()-clause. What has to be considered 
when using the rollup()-clause? (3 points) 
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c) Is the cube()-clause suited to express the analysis task from 3a)? Explain your answer. 
(2 points) 


d) Assume there are NULL-values in the state column (cities without state). What problem 
does this cause? Which SQL clause can help solving this problem? (2 points) 
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Question 4: Reporting Functions (12 Points) 
The following schema describes an order processing system. An order consists of entries which 
contain the quantity of a certain product. Every order is assigned to one customer and each 
customer is assigned to one nation. 


Solve the following tasks using multidimensional SQL queries. 


nation customer 


n-name c-name orderid 
n-code address customer 
nation date 
total_price 


-name order 
part-price position 


product 
quantity 


price 


a) List all parts (names) with their sold quantity and rank them in descending order. 
(4 points) 
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b) For each product, list all countries with customers for that product and rank the coun- 
tries in descending order. (5 points) 


c) Explain which information the following query is generating (3 points). 


SELECT year, quarter, sales, sales/sum(sales) over(PARTITION BY year) 
FROM ( 
SELECT extract(year FROM date) as year, 
extract (quarter FROM date) as quarter, 
count (*) as sales 


FROM order 
GROUP BY year, quarter 
) as r 
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Question 5: Schema Matching (12 Points) 
Match the attributes of the following two schemas. Use the stable marriage algorithm with 
trigram similarity for scoring. 


$1 S2 


[ota ets} pice is | | 


a) Find the trigram representation for each attribute. (3 points) 


b) Apply the stable marriage algorithm. Use the trigram representations from 5a) as pri- 
mary and alphabetical order as secondary preference mechanism. The attributes of S1 
propose to those of $2. Report the trigram scores (use the table below), the preference 
lists, and every step of the stable marriage algorithm. (9 points) 


_ [~ r l 
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Question 6: Time Series Data (7 Points) 
Given is a time series with a season length of s = 3 and a linear trend component. t marks 
the time and y marks the time series values. 
a Re A 
oF Go a2 Or I A i PR Sass 2a ae 


a) Extract the trend component of the given time series using classical decomposition. 
(3 points) 
xp Wie ee | BE Be ok a | eae il SOs I ey eB. 1] 38 
trend | TN | | (| | Ge | | | 


b) Detrend the given time series using classical decomposition and the extracted trend 
component from 6a). (1 point) 

So | | 2s (a 2a | = | 

demend] | EE EE 


c) Extract the season component of the detrended time series. (1 point) 
ee es | | 


season | | | 


d) Apply an autoregressive model with p=2 (AR2) to the given time series. Use the para- 
meters 61 = ¢2 = 0.5. (2 points) 

to {1 | 2 {3 | 4 {5 | 6 |] 7] 8 | 9 | 

AB tT 
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Question 7: Training of Neural Networks (12 Points) 


a) Build the truth table for the following logical function. (2 points) 


1, ifa, =1 and a =1, 


0, else. 


t= g(x1, £2) = 


LY x2 t 


b) Train a step-activation perceptron to model the function g(x1, 22) with the perceptron 
training algorithm. Use wi; = 0.2 and wa, = 1.1 as initial weights and a = 0.1 as 
learning rate. Report your calculations and the final weights w 1; and wg. (10 points) 
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Question 8: Word Embedding Training (7 Points) 
Apply the naive training algorithm to train a word-embedding model on the following context: 


e context window size: 4 (2 words before + 2 words after target) 
e Word Embedding dimensionality: 100 
e hidden layer size: 89 


e total number of words in vocabulary: 20k 


a) First, perform the embedding lookup. Here, from the context words a one-hot-matrix 
M is constructed and multiplied from left with the Embedding Matrix E. The context 
words of the current target word are c, g, a, and e. Fill in the corresponding matrix M. 
Mark all fields with a value different from 0, all empty fields will be assumed to show 
0-values. (4 points) 


E 


o|@ joa} o 
tbh 


x 
memreoeadaeae ys 
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b) After the embedding lookup and reshaping, we perform a classification for which we use 
a neural network as shown below (first layer(left) is input, third layer(right) is output). 
After the reshaping of 8a), how many neurons are necessary in each layer? (3 points) 


ist 2nd 3rd 
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Question 9: Activation Functions (4 Points) 
Two activation functions (softmax and ReLU) are given below: 

e*i 
> oe. 


Both functions are applied separately on the same vector z with 30.000 elements. 


softmax(z); ReLU(z;) = mazx(0, z;) 


a) Describe the effect of each activation function on the input vector. (2 points) 


b) Which activation function is faster, explain your answer. (2 point) 
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Question 10: Data Analysis (7 Points) 
Assume the following scenario: The manager of a company notices that some products receive 
negative reviews and wants to know why. A data analyst discovers a large amount of negative 
reviews where customers complain about defective products and reports this back to the 


manager. Now, the manager asks the data science team to find out how many products 
receive a bad review due to defects. 


The data science team comes up with the following three possible solutions for this problem: 


Relational Database Use a given table of synonyms for the word “defect” and perform 
SQL queries on negative reviews to find reviews containing at least one of the synonyms 
for “defect”. Order the result set by the amount of synonyms contained in each review. 


Word Embeddings Use word embeddings and their ability to represent semantic similarity. 


Classification Apply classification techniques to group reviews into complains about defec- 
tive product and other complaints. 


a) Modify the Relational Database solution to exploit the semantic similarity of word 
embeddings. (2 points) 


b) Describe how classification techniques could be used to determine whether the review 
complains about a defective product or not. (2 points) 
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c) List at least one advantages and one disadvantages for each of the three solutions. 


(3 points) 


Relational Database 


Word Embeddings 


Classification 


Advantage 


Disadvantage 
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